40927
1060
2180
| Description | Variable Name | Value |
|---|---|---|
| Module | 1, Preprocessing & QC | |
| Date | Sys.time() | 2021-09-21 15:39:16 |
| Input Cell Ranger | input_data | 16-Ochocka_2021_murine_GBM_immune-mctrl1, 17-Ochocka_2021_murine_GBM_immune-mctrl2, 18-Ochocka_2021_murine_GBM_immune-fctrl1, 19-Ochocka_2021_murine_GBM_immune-fctrl2, 20-Ochocka_2021_murine_GBM_immune-mtumor1, 21-Ochocka_2021_murine_GBM_immune-mtumor2, 22-Ochocka_2021_murine_GBM_immune-ftumor1, 23-Ochocka_2021_murine_GBM_immune-ftumor2 |
| gene/cell Upper Limit (N/cell) | gene.upperlimit | 9000 |
| gene/cell Lower Limit (N/cell) | gene.lowerlimit | 200 |
| Filtered by unmatched reads | unmatched.rate.filter.flag | TRUE |
| Mitochondrial Content Upper Limit (%/cell) | mt.upperlimit | 10 |
| Data Subsampled | subsampled | FALSE |
| Cluster Resolution | cluster.resolution | 1 |
| Filtered by organism | organism.filter.flag | FALSE |
| Number of Workers | normScale | 1 |
| Max Memory (Gb) | max.memory | 200 |
| Variables regressed out | vars2regress | percent.mt |
| Input Species | all_input_organisms | Mm, Mm, Mm, Mm, Mm, Mm, Mm, Mm |
| Feature selection method | feature.select.method | hvg |
| Scaling method | scale.method | sct |
| PCA component selection method | pca.component.select.method | cum_var |
| Normalization Method | norm_method | SCT |
| PCA Method | pca.method | pca |
| Principal Components Included | nDim | 28 |
| Unmatched rate upper limit | unmatch.high | 1 |
| Unmatched rate lower limit | unmatch.low | 0 |
| nPCA components used | nDim | 28 |
| Run Time (s) | elapsed.time | 566.02 |
| Run Identifier | run.id | R643_M01_NM2 |
| User | user | NM2 |
| Central Log Updated | clog.update.success | TRUE |
| Output Results (.Rdata) | save.filename | R643_M01_NM2_Ochocka2021_murineGBMimmune_210921.Rdata |
| Output Directory | save.directory | Preprocessed_Datasets/ |
| Output Saved | save.flag | TRUE |
---
title: "QC and Preprocessing"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
flexdashboard::flex_dashboard:
orientation: rows
vertical_layout: fill
source_code: embed
theme: flatly
navbar:
- { title: "scPipeline", href: "https://github.com/NMikolajewicz/scPipeline" }
- { title: "scMiko", href: "https://github.com/NMikolajewicz/scMiko" }
editor_options:
chunk_output_type: inline
knit: (function(inputFile, encoding) {
rmarkdown::render(input = inputFile,
encoding = encoding,
output_file = if (exists("user")){paste0(
xfun::sans_ext(inputFile), '_',user, "_",
paste0(format(Sys.time(), "%d_%b_"), gsub(" ", "_", gsub(":", "_", format(Sys.time(), "%X"))), format(Sys.time(), "_%Y")), '.html'
)} else {paste0(xfun::sans_ext(inputFile), '_',"Guest", "_",
paste0(format(Sys.time(), "%d_%b_"), gsub(" ", "_", gsub(":", "_", format(Sys.time(), "%X"))), format(Sys.time(), "_%Y")), '.html')},
output_dir = if (exists("data.path")) paste0(data.path, "/HTML_Reports") else NULL
)
})
---
```{r setup, include=FALSE}
# clear global enviroment
rm(list = setdiff(ls(), c("data.path", "user")))
invisible({gc()})
# initiate timer
start.time <- proc.time()
# list of packages to load
packages2load <- c('viridis', 'scMiko', 'ggpubr', 'dplyr', 'WGCNA', 'cowplot', 'schex', 'Seurat', 'sctransform', 'plyr', 'tidyr', 'reshape2', 'Matrix', 'RColorBrewer', 'ggplot2', 'gridExtra', 'DT', 'flexdashboard', 'future', 'gplots', 'reticulate')
# load packages
invisible(lapply(packages2load, library, character.only = TRUE, quietly = T))
```
```{r c2 - parameter_specification}
# specify parameters
parameter.list <- list(
which.data = c("16-O", "17-O", "18-O", "19-O", "20-O", "21-O", "22-O", "23-O"),
which.strata = NA, #"wt"
organism.filter.flag = F,
organism.include = "Mm",
save.flag = T, # OPTIONAL; logical (default = T) # save results
save.filename = "Ochocka2021_murineGBMimmune_210921.Rdata", # note that prefix containing module number and run ID is appended to save file.
save.directory = "Preprocessed_Datasets/",
cluster.resolution = 1,
gene.upperlimit = 9000, #9000
gene.lowerlimit = 200, #200
mt.upperlimit = 10, #60
unmatched.rate.filter.flag = T,
vars2regress = c("percent.mt"), # optional: "dataset" (for linear batchcorrection)
correct.artifact = T,
feature.select.method = "hvg", # options: hvg (recommended), deviance
scale.method = "sct", # option: sct (recommended), null_residuals
pca.method = "pca", #option: pca (recommended), glmpca
pca.component.select.method = "cum_var", #options" cum_var, elbow
pca.var.threshold = 0.9, # [0,1], ignored if pca.component.select.method = elbow
conserve.memory = F,
update.log = T,
save.pdf = T,
print.inline = F
)
# save.pdf <- T
n.workers <- list(
normScale = 1
)
max.memory <- 200 # numeric, in terms of Gb (set to 100 for cluster)
```
```{r get input data paths}
if (!exists("data.path") & !exists("user")) {
stop("data.path and user variables do not exist in global enviroment. Ensure that these are specified in .Rprofile. If specified, try restarting Rstudio.")
}
# get input data files paths
inputDataSpecification <- function(inputs = "M01_inputs.csv", subsets = "M01_subsets.csv"){
input.data <- read.csv(inputs, stringsAsFactors = F)
colnames(input.data) <- rmvCSVprefix(colnames(input.data))
input.data <- apply(input.data, 2, function(x) gsub("Â", "", x))
input.data <- as.data.frame(input.data)
# get subsets
input.subset <- read.csv(subsets, stringsAsFactors = F)
# wrangle into list
u.samples <- unique(input.subset$sample)
input.barcode.strata <- list()
for (i in 1:length(u.samples)){
subset.strata <- input.subset[input.subset$sample %in% u.samples[i], ]
current.strata <- list()
for (j in 1:nrow(subset.strata)){
current.barcodes <- subset.strata$barcodes[j]
current.barcodes <- strsplit(current.barcodes, ", ")[[1]]
current.strata[[subset.strata$set[j]]] <- current.barcodes
}
input.barcode.strata[[u.samples[i]]] <- current.strata
}
return(list(
input.data = input.data,
input.barcode.strata = input.barcode.strata
))
}
input.list <- inputDataSpecification()
input.data <- input.list$input.data
input.barcode.strata <- input.list$input.barcode.strata
```
```{r c3 - additional specifications and assertions}
# Specify input data files
which.data <- parameter.list$which.data # REQUIRED
if ("which.strata" %in% names(parameter.list)){
which.strata <- parameter.list$which.strata # NA if no stratification
} else {
which.strata <- NA
}
# specify input
stopifnot(all(which.data %in% as.vector(input.data$sample)))
specified.input <- input.data[as.vector(input.data$sample) %in% which.data, ]
# get input data specifications
input_data <- as.vector(specified.input$files)
input_pcr_barcodes <- as.vector(specified.input$pcr.barcodes)
input_rt_barcodes <- as.vector(specified.input$rt.barcodes)
input_set <- c(input_data, input_pcr_barcodes, input_rt_barcodes)
input.type <- as.vector(specified.input$input.type)
all_input_organisms <- as.vector(specified.input$species)
dir <- as.vector(specified.input$directory)
# ensure correct path is used for data input
if (exists("data.path")){
dir <- paste0(data.path, dir)
}
# ensure regression variables are correctly specified
if (length(which.data) == 1){
parameter.list$vars2regress <- parameter.list$vars2regress[!(parameter.list$vars2regress %in% "dataset")]
}
# get stratification details
if (!is.na(which.strata)){
which.strata.specified <- which.strata
which.strata <- c()
for (i in 1:length(which.strata.specified)){
which.strata <- c(which.strata, input.barcode.strata[[which.data[i]]][[which.strata.specified[i]]])
}
}
# ensure correct directory is specified
stopifnot(exists("input.type"))
# ensure input data are specified
stopifnot(exists("input_set"))
try({
parameter.list$organism.include <- rep(parameter.list$organism.include, length(input_data))
}, silent = T)
# if set_names are not specified, set_names <- ""
if (exists("parameter.list$set_names")){
stopifnot(is.character(parameter.list$set_names))
} else {parameter.list$set_names <- ""}
# if pilot input are not specified, pilot_input <- ""
if (exists("pilot_input")){
stopifnot(length(pilot_input) == 1)
} else {pilot_input <- ""}
total_sets <- input_set
```
```{r c4 - analysis log}
df.log <- initiateLog("1, Preprocessing & QC")
if (input.type == 1){
df.log <- addLogEntry("Input Data (.Rdata)", input_data, df.log, "input_data")
df.log <- addLogEntry("Input RT Barcode (.txt)", input_rt_barcodes, df.log, "input_rt_barcodes")
df.log <- addLogEntry("Input PCR Barcode (.txt)", input_pcr_barcodes, df.log, "input_pcr_barcodes")
} else if (input.type == 2){
df.log <- addLogEntry("Input Cell Ranger", input_data, df.log, "input_data")
} else if (input.type == 3){
df.log <- addLogEntry("Input TPM Matrix", input_data, df.log, "input_data")
}
df.log <- addLogEntry("gene/cell Upper Limit (N/cell)", as.character(parameter.list$gene.upperlimit), df.log, "gene.upperlimit")
df.log <- addLogEntry("gene/cell Lower Limit (N/cell)", as.character(parameter.list$gene.lowerlimit), df.log, "gene.lowerlimit")
df.log <- addLogEntry("Filtered by unmatched reads",
as.character(parameter.list$unmatched.rate.filter.flag), df.log, "unmatched.rate.filter.flag")
df.log <- addLogEntry("Mitochondrial Content Upper Limit (%/cell)", as.character(parameter.list$mt.upperlimit), df.log, "mt.upperlimit")
if ("subsample_factor" %in% names(parameter.list)){
if (parameter.list$subsample_factor == 1){subsampled <- FALSE} else {subsampled <- TRUE}
} else {
parameter.list$subsample_factor <- 1
subsampled <- FALSE
}
df.log <- addLogEntry("Data Subsampled", subsampled, df.log, "subsampled")
df.log <- addLogEntry("Cluster Resolution", as.character(parameter.list$cluster.resolution), df.log, "cluster.resolution")
df.log <- addLogEntry("Filtered by organism", as.character(parameter.list$organism.filter.flag), df.log, "organism.filter.flag")
if (parameter.list$organism.filter.flag) {
df.log <- addLogEntry("Species Included", as.character(parameter.list$organism.include), df.log, "organism.include")
}
df.log <- addLogEntry("Number of Workers", as.character(n.workers$normScale), df.log, "normScale")
df.log <- addLogEntry("Max Memory (Gb)", as.character(max.memory), df.log, "max.memory")
if (!is.na(which.strata)) {
df.log <- addLogEntry("Barcode(s) Included", which.strata, df.log, "which.strata")
}
df.log <- addLogEntry("Variables regressed out", as.character(parameter.list$vars2regress), df.log, "vars2regress")
df.log <- addLogEntry("Input Species", all_input_organisms, df.log, "all_input_organisms")
df.log <- addLogEntry("Feature selection method", parameter.list$feature.select.method, df.log, "feature.select.method")
df.log <- addLogEntry("Scaling method", parameter.list$scale.method, df.log, "scale.method")
df.log <- addLogEntry("PCA component selection method", parameter.list$pca.component.select.method, df.log, "pca.component.select.method")
```
```{r}
loadMoffat.dev <- function(import_set, subsample_factor, input_organisms, organism_include, dir) {
# load gene count matrix
import_set_path <- paste(dir, import_set, sep ="")
load(import_set_path[1])
# assign row (genes) and column (cells) names
gene_count2 <- gene_count
# filter out incorrect species genes immediately.
all.genes <- rownames(gene_count2)
include.which.genes <- rep(FALSE, length(all.genes))
if ((length(input_organisms) > length(organism_include)) | (length(organism_include) == 1)){
for (i in 1:length(organism_include)){
if (organism_include[i] == "Mm"){
include.which.genes[grepl("MUSG", all.genes)] <- T
} else if (organism_include[i] == "Hs"){
include.which.genes[!(grepl("MUSG", all.genes))] <- T
}
}
df_gene <- df_gene[include.which.genes, ]
gene_count <- gene_count[include.which.genes, ]
gene_count2 <- gene_count2[include.which.genes, ]
}
# Infer organism
# orgs <- scMiko::inferSpecies(gene_count2, input_organisms)
orgs <- detectSpecies(gene_count2)
df_cell$orgs <- orgs
# only keep protein coding genes
# assign gene and cell names
rownames(gene_count2) <- df_gene$gene_id
colnames(gene_count2) <- df_cell$sample
# subsample data
if (subsample_factor < 1){
col_ind <- c(1:dim(gene_count2)[2])
subsample_ind <- sample(col_ind, round(subsample_factor * dim(gene_count2)[2]), replace = FALSE)
gene_count2 <- gene_count2[,subsample_ind]
} else {
subsample_ind <- c(1:dim(gene_count2)[2])
}
# Need names vectors to add additional metadata to Seurat object
gNames <- as.character( df_gene$gene_name );
names(gNames) <- as.character( df_gene$gene_id );
names(gNames) <-gsub("\\\\..*","",as.vector( names(gNames)))
# Add PCR barcodes
pcr.barcode.flag <- F
if (dir != import_set_path[2]){
try({
grps <- read.csv(import_set_path[2], header=T,sep="\\t",stringsAsFactors=F);
wells <- gsub("Han_[0-9]+_([A-Z0-9]{2,3}).[ACGT]+$","\\\\1",colnames(gene_count2));
myGrps <- grps$ConditionGroup[ match(wells,grps$sampleWell) ];
names(myGrps) <- colnames(gene_count2)
myGrps <- myGrps[subsample_ind]
pcr.barcode.flag <- T
}, silent = T)
}
# Add RT barcode
rtBC <- tryCatch({
rtBC <- read.csv(import_set_path[3],header=T,sep="\\t",stringsAsFactors=F);
}, error = function(e){
rtBC <- read.csv2(import_set_path[3],header=T,sep="\t",stringsAsFactors=F)
return(rtBC)
})
rtBC <- rtBC[subsample_ind, ]
if (is.null(rtBC$Plate)) rtBC$Plate <- ""
rtBC$plate.well.sample <- paste0("P", rtBC$Plate, ".", rtBC$Well, ".", rtBC$Sample.type)
barcodes <- gsub(".+([ACGT]{10}$)","\\\\1",colnames(gene_count2));
if (unique(barcodes) == "\\1") barcodes <- gsub(".+([ACGT]{10}$)","\\1",colnames(gene_count2));
idx <- match( barcodes,rtBC$rt.bc );
rtGrp <- rtBC$Sample.type[idx];
names(rtGrp) <- colnames(gene_count2);
rtPWS <- data.frame(
plate = rtBC$Plate[idx],
well = rtBC$Well[idx],
sample = rtBC$Sample.type[idx])
rtPWS.summary <- rtPWS %>% dplyr::group_by(plate, well, sample) %>% dplyr::tally()
# remove .* suffix in ensembl
rownames(gene_count2)<-gsub("\\\\..*","",as.vector( rownames(gene_count2)))
# Create seurat object with count matrix data
so <- CreateSeuratObject(counts=gene_count2,project="Hong-sciSeq3",min.cells=0,min.features=0,names.field=2,names.delim="." )
# Add gene symbols as meta data that we can use later
mat_ens <- rownames(so@assays[["RNA"]])
match.id <- match(mat_ens, names(gNames))
so[["RNA"]] <- AddMetaData( object=so[["RNA"]],metadata=names(gNames)[match.id],col.name="ENSEMBL");
so[["RNA"]] <- AddMetaData( object=so[["RNA"]],metadata=as.vector((gNames)[match.id]),col.name="SYMBOL");
# assign meta data
so.meta <- so@meta.data
so.meta$ind <- seq(1, nrow(so.meta))
so.meta$sample <- rownames(so.meta)
so.merge <- merge(so.meta, df_cell, by = "sample", all.x = T)
so.merge <- so.merge %>% dplyr::arrange(ind)
if ("unmatched_rate" %in% colnames(df_cell)) so[["unmatched.rate"]] <- so.merge$unmatched_rate
so[["Organism"]] <- so.merge$orgs
if (length(import_set) > 1){ # same as above, if barecode and well info. provided, assign to so structure
# Add group #s for mapping cells to PCR condition groups
if (pcr.barcode.flag) {
so[["group"]] <- myGrps;
} else {
so[["group"]] <- "unspecified";
}
# Add in cell line group info
so$Barcode <- rtGrp;
}
try({so@misc[["gene.info"]] <- df_gene}, silent = T)
try({so@misc[["cell.info"]] <- df_cell}, silent = T)
try({so@misc[["plate.summary"]] <- rtPWS.summary}, silent = T)
# return output
output <- list(so, gNames)
return(output)
}
```
```{r}
#
# function(import_set, input_organisms, dir = "") {
#
# import_set_path <- paste(dir, import_set, sep ="")
#
# # import cell ranger-processed data
# expression_matrix <- Seurat::Read10X(data.dir = import_set_path[1])
#
# # file.path(import_set_path[1], "barcodes.tsv")
# # list.files(path = import_set_path[1], pattern=NULL, all.files=FALSE,
# # full.names=FALSE)
#
# # assign to secondary matrix
# expression_matrix2 <- expression_matrix
#
# # import gene and ensembl names
# load.success <- F
# try({
# feature.names = read.delim(paste(import_set_path[1], "/genes.tsv", sep = ""),
# header = FALSE,
# stringsAsFactors = FALSE)
# load.success <- T
# }, silent = T)
# if (!load.success){
# feature.names = read.delim(paste(import_set_path[1], "/features.tsv", sep = ""),
# header = FALSE,
# stringsAsFactors = FALSE)
# }
#
#
# # remove hg19 tags
# feature.names[,1] <- gsub("hg19_", "", feature.names[,1])
# feature.names[,2] <- gsub("hg19_", "", feature.names[,2])
#
# # create gene list
# gNames <- as.character( feature.names$V2 );
# names(gNames) <- as.character( feature.names$V1 );
# names(gNames) <-gsub("\\..*","",as.vector( names(gNames)))
#
# # convert gene symbols to ensemble (in expression matrix)
# rownames(expression_matrix2) <- names(gNames)
#
# # assign species
# orgs <- detectSpecies(expression_matrix2)
# # orgs <- scMiko::m1.inferSpecies(expression_matrix2, input_organisms)
#
# # create seurat object
# # so = CreateSeuratObject(counts = expression_matrix2,project="cell_ranger",min.cells=3,min.features=200)
# so = CreateSeuratObject(counts = expression_matrix2,project="cell_ranger",min.cells=0,min.features=0)
#
# # add gene symbols as meta data in seurat object
# mat_ens <- rownames(so@assays[["RNA"]])
# match.id <- match(mat_ens, names(gNames))
# # gNames_filtered <- gNames[match.id]
#
# # Add gene symbols as meta data that we can use later
# # so[["RNA"]] <- AddMetaData( object=so[["RNA"]],metadata=gNames_filtered,col.name="gene_name");
# so[["RNA"]] <- AddMetaData( object=so[["RNA"]],metadata=names(gNames[match.id]),col.name="ENSEMBL");
# so[["RNA"]] <- AddMetaData( object=so[["RNA"]],metadata=as.vector(gNames[match.id]),col.name="SYMBOL");
#
# # Add in inferred organism
# so$Organism <- orgs;
#
# # Specify barcodes
# so$Barcode <- "unspecified";
#
# output <- list(so, gNames)
#
# return(output)
# }
#
```
```{r c6 - load_data}
# Run import function
so.list <- list()
gNames.list.list <- list()
# sequence indices
idx.seq <- c(1, 1+length(input_data), 1+(2*length(input_data)))
# assertions
stopifnot(length(dir) == length(input_data))
stopifnot(length(all_input_organisms) == length(input_data))
for (i in 1:length(input_data)){
cur.total_sets <- total_sets[idx.seq + (i-1)]
if (input.type[i] == 1) {
output.list <- loadMoffat(import_set = cur.total_sets,
subsample_factor = parameter.list$subsample_factor,
input_organisms = stringr::str_trim(unlist(strsplit(all_input_organisms[i], ","))) ,
organism_include = parameter.list$organism.include[i],
dir = dir[i])
} else if (input.type[i] == 2){
output.list <- loadCellRanger(import_set = cur.total_sets,
input_organisms = stringr::str_trim(unlist(strsplit(all_input_organisms[i], ","))),
dir = dir[i])
} else if (input.type[i] == 3){
output.list <- loadMat(import_set = cur.total_sets,
input_organisms = stringr::str_trim(unlist(strsplit(all_input_organisms[i], ","))),
dir = dir[i])
}
# unlist outputs
so.list[[i]] <- output.list[[1]]
gNames.list.list[[i]] <- output.list[[2]]
if ((unique(so.list[[i]]@meta.data[["Barcode"]]) == "unspecified") ){
so.list[[i]]@meta.data[["Barcode"]] <- paste("T", i, sep = "")
} else if (is.na(unique(so.list[[i]]@meta.data[["Barcode"]]))){
stop("Barcodes incorrectly specified") #140620
}
}
# assign set labels and merge seurat objects if multiple present
for (i in 1:length(so.list)){
if (length(which.data) == length(which.strata)){
set.label <- paste0(which.data[i],"_", which.strata[i])
} else {
set.label <- which.data[i]
}
so.list[[i]][["dataset"]] <- set.label
}
so <- mergeSeuratList(so.list)
rm(so.list);
rm(output.list);
invisible({gc()})
# Assumed that all gene lists are identical
gNames.list <- gNames.list.list[[1]]
```
```{r c8 - filter_data_by_subset, include = FALSE}
# define subset parameters
if ("organism.include" %in% names(parameter.list)){
if (sum(c("Mm", "Hs") %in% parameter.list$organism.include) > 0){
subset.df <- data.frame(field = "Organism", subgroups = parameter.list$organism.include)
}
# subset data
so <- subsetSeurat(so, subset.df)
}
```
```{r visualize plate layout, fig.width=8, fig.height=6}
visualize.plate <- F
if (visualize.plate){
so.meta <- so@meta.data
myLetters <- toupper(letters[1:26])
so.meta$well <- stringr::str_extract(rownames(so.meta), "[A-Z0-9]*")
so.meta$row <- match(stringr::str_extract(so.meta$well, "[A-Z]*"), myLetters)
so.meta$col <- as.numeric(stringr::str_remove(so.meta$well, "[A-Z]*"))
so.meta$bc <- gsub("\\.", "", stringr::str_remove(rownames(so.meta), "[A-Z0-9_]*"))
# import 384 ligation bc
df.384lig <- readxl::read_xlsx("C:/Users/Owner/Dropbox/PDF Projects - JM/Data/scRNA-seq/01_sci-RNA-seq3_Hong_Kevin_Jason/ReportFolder/sciRNAseq3_RunHH14_210729/siRNAseq3_Ligation_barcodes_090821.xlsx")
# so.meta <- so.meta %>% dplyr::filter(grepl("L1_|L2_|L3_", so.meta$Barcode))
so.meta <- so.meta %>% dplyr::filter(grepl("08_", so.meta$Barcode))
table(so.meta$Barcode)
# dat <- so.meta
# row.col = "row"; col.col = "col"; value.col = "nCount_RNA"
asPlate <- function(dat, row.col = "row", col.col = "col", value.col = "nCount_RNA", aggregate.function = "sum"){
if (value.col == "n_nuclei"){
# dat <- dat %>% dplyr::select(c(row.col, col.col, value.col))
dat$col.val <- dat[ ,col.col]
dat$row.val <- dat[ ,row.col]
u.row <- unique(dat [,row.col]); u.row <- u.row[order(u.row)]
u.col <- unique(dat [,col.col]); u.col <- u.col[order(u.col)]
dat <- dat %>%
dplyr::group_by(col.val, row.val) %>%
dplyr::summarise(value = length(col.val))
} else {
dat <- dat %>% dplyr::select(c(row.col, col.col, value.col))
dat$col.val <- dat[ ,col.col]
dat$row.val <- dat[ ,row.col]
u.row <- unique(dat [,row.col]); u.row <- u.row[order(u.row)]
u.col <- unique(dat [,col.col]); u.col <- u.col[order(u.col)]
dat$value <- dat[ ,value.col]
}
agg.func <- sum
if (aggregate.function == "sum"){
agg.func <- sum
} else if (aggregate.function == "mean"){
agg.func <- mean
} else if (aggregate.function == "median"){
agg.func <- median
}
# if (value.col)
dat.sum <- dat %>%
dplyr::group_by(row.val, col.val) %>%
dplyr::summarise(x= agg.func(value, na.rm = T))
dat.sum$row.val <- factor(dat.sum$row.val, levels = u.row)
dat.sum$col.val <- factor(dat.sum$col.val, levels = u.col)
dat.sum %>%
ggplot(aes(y = (row.val), x = (col.val), fill = x)) +
geom_tile() +
labs(title = paste0(value.col, "/well"), subtitle = "96 well plate", x = "columns (1-12)", y = "rows (A-H)", fill = value.col,
caption = paste0("Aggregation Function: ", aggregate.function)) +
viridis::scale_fill_viridis()
}
plt.plate.umi <-asPlate(dat = so.meta, row.col = "row", col.col = "col", value.col = "nCount_RNA", aggregate.function = "mean")
plt.plate.gene <-asPlate(dat = so.meta, row.col = "row", col.col = "col", value.col = "nFeature_RNA", aggregate.function = "mean")
plt.plate.ur <-asPlate(dat = so.meta, row.col = "row", col.col = "col", value.col = "unmatched.rate", aggregate.function = "mean")
plt.plate.nuclei <-asPlate(dat = so.meta, row.col = "row", col.col = "col", value.col = "n_nuclei")
plt.plate <- cowplot::plot_grid(plt.plate.umi, plt.plate.gene, plt.plate.ur, plt.plate.nuclei, align = "hv")
plt.plate
table(so.meta$Barcode)
}
# savePDF("p14_08Mix_hNP_plateStatistics_visualized_090821.pdf", plt.plate, fig.width=8, fig.height=6)
```
```{r visualize RT plate layout, fig.width=16, fig.height=6}
visualize.plate <- F
if (visualize.plate){
so.meta <- so@meta.data
myLetters <- toupper(letters[1:26])
so.meta$well <- stringr::str_extract(rownames(so.meta), "[A-Z0-9]*")
so.meta$row <- match(stringr::str_extract(so.meta$well, "[A-Z]*"), myLetters)
so.meta$col <- as.numeric(stringr::str_remove(so.meta$well, "[A-Z]*"))
so.meta$bc <- gsub("\\.", "", stringr::str_remove(rownames(so.meta), "[A-Z0-9_]*"))
# Add RT barcode
df.384lig <- tryCatch({
rtBC <- read.csv("C:/Users/Owner/Dropbox/PDF Projects - JM/Data/scRNA-seq/01_sci-RNA-seq3_Hong_Kevin_Jason/ReportFolder/sciRNAseq3_RunHH14_210729/Run14_rt_barcodes_20210730NewFormat_288used.txt",header=T,sep="\\t",stringsAsFactors=F);
}, error = function(e){
df.384lig <- read.csv2("C:/Users/Owner/Dropbox/PDF Projects - JM/Data/scRNA-seq/01_sci-RNA-seq3_Hong_Kevin_Jason/ReportFolder/sciRNAseq3_RunHH14_210729/Run14_rt_barcodes_20210730NewFormat_288used.txt",header=T,sep="\t",stringsAsFactors=F)
return(df.384lig)
})
so.meta <- so.meta %>% dplyr::filter(grepl("08_", so.meta$Barcode))
u.plates <- unique(df.384lig$Plate)
u.well <- unique(df.384lig$Well)
for (i in 1:length(u.plates)){
plate.name <- u.plates[i]
for (j in 1:length(u.well)){
well.name <- u.well[j]
which.ind <- which((df.384lig$Plate == plate.name) & (df.384lig$Well == well.name))
bc.name <- df.384lig$rt.bc[which.ind]
which.match <- which(grepl(bc.name, so.meta$bc))
df.384lig$n.nuclei[which.ind] <- length(which.match)
df.384lig$nCount[which.ind] <- mean(so.meta$nCount_RNA[which.match], na.rm = T)
df.384lig$nFeature[which.ind] <- mean(so.meta$nFeature_RNA[which.match], na.rm = T)
df.384lig$unmatch_rate[which.ind] <- mean(so.meta$unmatched.rate[which.match], na.rm = T)
}
}
df.384lig$row <- match(stringr::str_extract(df.384lig$Well, "[A-Z]*"), myLetters)
df.384lig$col <- as.numeric(stringr::str_remove(df.384lig$Well, "[A-Z]*"))
u.row <- unique(df.384lig$row); u.row <- u.row[order(u.row)]
u.col <- unique(df.384lig$col); u.col <- u.col[order(u.col)]
df.384lig$row <- factor(df.384lig$row, levels = u.row)
df.384lig$col <- factor(df.384lig$col, levels = u.col)
plt.plate.nuclei <- df.384lig %>%
ggplot(aes(y = (row), x = (col), fill = n.nuclei)) +
geom_tile() +
labs(title = paste0("n.nuclei/well"), subtitle = "96 well plate", x = "columns (1-12)", y = "rows (A-H)", fill = "n.nuclei") +
viridis::scale_fill_viridis() +
facet_wrap(~Plate)
plt.plate.umi <- df.384lig %>%
ggplot(aes(y = (row), x = (col), fill = nCount)) +
geom_tile() +
labs(title = paste0("nCount_RNA/well"), subtitle = "96 well plate", x = "columns (1-12)", y = "rows (A-H)", fill = "nCount_RNA") +
viridis::scale_fill_viridis() +
facet_wrap(~Plate)
plt.plate.gene <- df.384lig %>%
ggplot(aes(y = (row), x = (col), fill = nFeature)) +
geom_tile() +
labs(title = paste0("nFeature_RNA/well"), subtitle = "96 well plate", x = "columns (1-12)", y = "rows (A-H)", fill = "nFeature_RNA") +
viridis::scale_fill_viridis() +
facet_wrap(~Plate)
plt.plate.ur <- df.384lig %>%
ggplot(aes(y = (row), x = (col), fill = unmatch_rate)) +
geom_tile() +
labs(title = paste0("unmatched.rate/well"), subtitle = "96 well plate", x = "columns (1-12)", y = "rows (A-H)", fill = "unmatched.rate") +
viridis::scale_fill_viridis() +
facet_wrap(~Plate)
plt.plate <- cowplot::plot_grid(plt.plate.umi, plt.plate.gene, plt.plate.ur, plt.plate.nuclei, align = "hv")
plt.plate
# table(so.meta$Barcode)
}
# savePDF("p14_08Mix_hNP_plateStatistics_visualized_090821.pdf", plt.plate, fig.width=8, fig.height=6)
# savePDF("p14_08Mix_RTPlate_visualized_090821.pdf", plt.plate, fig.width=16, fig.height=6)
```
```{r visualize ligation plate layout, fig.width=16, fig.height=12}
visualize.plate <- F
if (visualize.plate){
so.meta <- so@meta.data
myLetters <- toupper(letters[1:26])
so.meta$well <- stringr::str_extract(rownames(so.meta), "[A-Z0-9]*")
so.meta$row <- match(stringr::str_extract(so.meta$well, "[A-Z]*"), myLetters)
so.meta$col <- as.numeric(stringr::str_remove(so.meta$well, "[A-Z]*"))
so.meta$bc <- gsub("\\.", "", stringr::str_remove(rownames(so.meta), "[A-Z0-9_]*"))
so.meta <- so.meta %>% dplyr::filter(grepl("37_|38_", so.meta$Barcode))
# import 384 ligation bc
df.384lig <- readxl::read_xlsx("C:/Users/Owner/Dropbox/PDF Projects - JM/Data/scRNA-seq/01_sci-RNA-seq3_Hong_Kevin_Jason/ReportFolder/sciRNAseq3_RunHH14_210729/siRNAseq3_Ligation_barcodes_090821.xlsx")
u.plates <- unique(df.384lig$`Plate Name`)
u.well <- unique(df.384lig$`Well Position`)
for (i in 1:length(u.plates)){
plate.name <- u.plates[i]
for (j in 1:length(u.well)){
well.name <- u.well[j]
which.ind <- which((df.384lig$`Plate Name` == plate.name) & (df.384lig$`Well Position` == well.name))
bc.name <- df.384lig$LigBarcodes...10[which.ind]
which.match <- which(grepl(bc.name, so.meta$bc))
df.384lig$n.nuclei[which.ind] <- length(which.match)
df.384lig$nCount[which.ind] <- mean(so.meta$nCount_RNA[which.match], na.rm = T)
df.384lig$nFeature[which.ind] <- mean(so.meta$nFeature_RNA[which.match], na.rm = T)
df.384lig$unmatch_rate[which.ind] <- mean(so.meta$unmatched.rate[which.match], na.rm = T)
}
}
df.384lig$row <- match(stringr::str_extract(df.384lig$`Well Position`, "[A-Z]*"), myLetters)
df.384lig$col <- as.numeric(stringr::str_remove(df.384lig$`Well Position`, "[A-Z]*"))
u.row <- unique(df.384lig$row); u.row <- u.row[order(u.row)]
u.col <- unique(df.384lig$col); u.col <- u.col[order(u.col)]
df.384lig$row <- factor(df.384lig$row, levels = u.row)
df.384lig$col <- factor(df.384lig$col, levels = u.col)
plt.plate.nuclei <- df.384lig %>%
ggplot(aes(y = (row), x = (col), fill = n.nuclei)) +
geom_tile() +
labs(title = paste0("n.nuclei/well"), subtitle = "96 well plate", x = "columns (1-12)", y = "rows (A-H)", fill = "n.nuclei") +
viridis::scale_fill_viridis() +
facet_wrap(~`Plate Name`)
plt.plate.umi <- df.384lig %>%
ggplot(aes(y = (row), x = (col), fill = nCount)) +
geom_tile() +
labs(title = paste0("nCount_RNA/well"), subtitle = "96 well plate", x = "columns (1-12)", y = "rows (A-H)", fill = "nCount_RNA") +
viridis::scale_fill_viridis() +
facet_wrap(~`Plate Name`)
plt.plate.gene <- df.384lig %>%
ggplot(aes(y = (row), x = (col), fill = nFeature)) +
geom_tile() +
labs(title = paste0("nFeature_RNA/well"), subtitle = "96 well plate", x = "columns (1-12)", y = "rows (A-H)", fill = "nFeature_RNA") +
viridis::scale_fill_viridis() +
facet_wrap(~`Plate Name`)
plt.plate.ur <- df.384lig %>%
ggplot(aes(y = (row), x = (col), fill = unmatch_rate)) +
geom_tile() +
labs(title = paste0("unmatched.rate/well"), subtitle = "96 well plate", x = "columns (1-12)", y = "rows (A-H)", fill = "unmatched.rate") +
viridis::scale_fill_viridis() +
facet_wrap(~`Plate Name`)
plt.plate <- cowplot::plot_grid(plt.plate.umi, plt.plate.gene, plt.plate.ur, plt.plate.nuclei, align = "hv")
plt.plate
# table(so.meta$Barcode)
}
# savePDF("p14_08Mix_hNP_plateStatistics_visualized_090821.pdf", plt.plate, fig.width=8, fig.height=6)
# savePDF("p14_PR_GBM_ligationPlate_visualized_090821.pdf", plt.plate, fig.width=16, fig.height=12)
```
```{r c9 - subset barcodes, include = FALSE}
so <- barcodeLabels(so, which.strata)
```
```{r c10 - get_mt_content, include = FALSE}
# MITOCHONDRIAL CONTENT
so <- getMitoContent(so, gNames.list)
```
```{r c11 - barcode rankings, fig.width= 15, fig.height=5, include = FALSE}
# get metadata
df.meta.qc <- so@meta.data
# rank QC metrics
df.meta.qc$rank.umi <- rank(-df.meta.qc$nCount_RNA)
df.meta.qc$rank.gene <- rank(-df.meta.qc$nFeature_RNA)
df.meta.qc$rank.mt <- rank(-df.meta.qc$percent.mt)
df.meta.qc$mitoOmit <- df.meta.qc$percent.mt > parameter.list$mt.upperlimit
df.meta.qc$mitoColor <- "black"
df.meta.qc$mitoColor[df.meta.qc$mitoOmit] <- "tomato"
plt.gene.umi.mito <- df.meta.qc %>%
ggplot(aes(x = nCount_RNA, y = (nFeature_RNA))) +
geom_point(color = df.meta.qc$mitoColor) +
scale_y_continuous(trans='log10') +
scale_x_continuous(trans='log10') +
geom_hline(yintercept = parameter.list$gene.upperlimit , color = "red", linetype = "dashed") +
geom_hline(yintercept = parameter.list$gene.lowerlimit, color = "red", linetype = "dashed") +
xlab("Transcript counts (n/cell)") + ylab("Gene counts (n/cell)") +
theme_classic() + labs(title = "Transcript vs Gene Counts", caption = "dashed red line: gene count thresholds\nred points: high mt content.")
# generate ranking plots
plt.umi.rank.mito <- df.meta.qc %>%
ggplot(aes(x = rank.umi, y = (nCount_RNA))) +
geom_point(color = df.meta.qc$mitoColor) +
scale_y_continuous(trans='log10') +
xlab("Cell Rank") + ylab("Transcript counts (n/cell)") +
theme_classic() + labs(title = "Transcript count ranking", caption = "red points: high mt content.")
plt.gene.rank.mito <- df.meta.qc %>%
ggplot(aes(x = rank.gene, y = (nFeature_RNA))) +
geom_point(color = df.meta.qc$mitoColor) + scale_y_continuous(trans='log10') +
geom_hline(yintercept = parameter.list$gene.upperlimit , color = "red", linetype = "dashed") +
geom_hline(yintercept = parameter.list$gene.lowerlimit, color = "red", linetype = "dashed") +
xlab("Cell Rank") + ylab("Gene counts (n/cell)") +
theme_classic() + labs(title = "Gene count ranking", caption = "dashed red line: gene count thresholds\nred points: high mt content.")
plt.mt.rank.mito <- df.meta.qc %>%
ggplot(aes(x = rank.mt, y = (percent.mt))) +
geom_point(color = df.meta.qc$mitoColor) + scale_y_continuous(trans='log10') +
geom_hline(yintercept = parameter.list$mt.upperlimit, color = "red", linetype = "dashed") +
xlab("Cell Rank") + ylab("Mitochondrial Content (%/cell)") +
theme_classic() + labs(title = "Mitochondrial content ranking", caption = "red: high mt content.")
if (input.type == 1){
unmatch.median <- median(df.meta.qc$unmatched.rate)
unmatch.mad <- mad(df.meta.qc$unmatched.rate, unmatch.median)
if (parameter.list$unmatched.rate.filter.flag){
unmatch.high <- unmatch.median + (1.96*unmatch.mad)
unmatch.low <- unmatch.median - (1.96*unmatch.mad)
} else {
unmatch.high <- 1
unmatch.low <- 0
}
df.meta.qc$matchOmit <- (df.meta.qc$unmatched.rate > unmatch.high) | (df.meta.qc$unmatched.rate < unmatch.low)
df.meta.qc$matchColor <- "black"
df.meta.qc$matchColor[df.meta.qc$matchOmit] <- "skyblue"
plt.unmatch.umi <- df.meta.qc %>%
ggplot(aes(x = nCount_RNA, y = (unmatched.rate))) +
geom_point(color = df.meta.qc$matchColor) +
xlab("Transcript Count") + ylab("Unmatched Rate") +
geom_hline(yintercept = unmatch.high, color = "red", linetype = "dashed") +
geom_hline(yintercept = unmatch.low, color = "red", linetype = "dashed") +
theme_classic() + labs(title = "Transcript count vs. unmatched rate", caption = "blue: unmatched rate outliers")
df.meta.qc$unmatch.rank <- rank(-df.meta.qc$unmatched.rate)
plt.umatch.rank <- df.meta.qc %>%
ggplot(aes(x = unmatch.rank, y = (unmatched.rate))) +
geom_point(color = df.meta.qc$matchColor) +
scale_y_continuous(trans='log10') +
geom_hline(yintercept = unmatch.high, color = "red", linetype = "dashed") +
geom_hline(yintercept = unmatch.low, color = "red", linetype = "dashed") +
xlab("Cell Rank") + ylab("Unmatched Rate") +
theme_classic() + labs(title = "Unmatched rate ranking", caption = "blue: unmatched rate outliers")
plt.gene.umi.match <- df.meta.qc %>%
ggplot(aes(x = nCount_RNA, y = (nFeature_RNA))) +
geom_point(color = df.meta.qc$matchColor) +
scale_y_continuous(trans='log10') +
scale_x_continuous(trans='log10') +
geom_hline(yintercept = parameter.list$gene.upperlimit , color = "red", linetype = "dashed") +
geom_hline(yintercept = parameter.list$gene.lowerlimit, color = "red", linetype = "dashed") +
xlab("Transcript counts (n/cell)") + ylab("Gene counts (n/cell)") +
theme_classic() + labs(title = "Transcript vs Gene Counts", caption ="blue: unmatched rate outliers")
# generate ranking plots
plt.umi.rank.match <- df.meta.qc %>%
ggplot(aes(x = rank.umi, y = (nCount_RNA))) +
geom_point(color = df.meta.qc$matchColor) +
scale_y_continuous(trans='log10') +
xlab("Cell Rank") + ylab("Transcript counts (n/cell)") +
theme_classic() + labs(title = "Transcript count ranking", caption = "blue: unmatched rate outliers")
plt.gene.rank.match <- df.meta.qc %>%
ggplot(aes(x = rank.gene, y = (nFeature_RNA))) +
geom_point(color = df.meta.qc$matchColor) + scale_y_continuous(trans='log10') +
geom_hline(yintercept = parameter.list$gene.upperlimit , color = "red", linetype = "dashed") +
geom_hline(yintercept = parameter.list$gene.lowerlimit, color = "red", linetype = "dashed") +
xlab("Cell Rank") + ylab("Gene counts (n/cell)") +
theme_classic() + labs(title = "Gene count ranking", caption = "blue: unmatched rate outliers")
plt.mt.rank.match <- df.meta.qc %>%
ggplot(aes(x = rank.mt, y = (percent.mt))) +
geom_point(color = df.meta.qc$matchColor) + scale_y_continuous(trans='log10') +
geom_hline(yintercept = parameter.list$mt.upperlimit, color = "red", linetype = "dashed") +
xlab("Cell Rank") + ylab("Mitochondrial Content (%/cell)") +
theme_classic() + labs(title = "Mitochondrial content ranking", caption = "blue: unmatched rate outliers")
} else {
plt.umatch.rank <- NULL
plt.unmatch.umi <- NULL
unmatch.high <- 1
unmatch.low <- 0
}
if (parameter.list$print.inline) {
cowplot::plot_grid(plt.gene.umi.mito, plt.umi.rank.mito, plt.gene.rank.mito, plt.mt.rank.mito, ncol = 4, align = "h")
if (input.type == 1){
ggpubr::ggarrange(plt.umi.rank.match, plt.gene.rank.match, plt.mt.rank.match,
plt.unmatch.umi, plt.gene.umi.match, plt.umatch.rank,ncol = 3, nrow = 2)
}
}
```
```{r c12 - Density plot of % mitochondrial genes, warning = FALSE, include = FALSE}
# MITOCHONDRIAL CONTENT DISTRIBUTIONS
calc_mito_content_by_bc <- function(so, set_name, bc_info_flag, input.type = 1){
if (bc_info_flag == TRUE & input.type != 2){
# Compute mitochrondrial content for each cell type
# get unique cell types
barcodes <-unique(so@meta.data[["Barcode"]])
# create dataframe to store mitochondrial content, stratified by cell type
df_by_bc <- NULL
# for each cell type
for (i in 1:length(barcodes)) {
percent.mt_by_bc <- as.vector(so$percent.mt[so$Barcode == barcodes[i]])
count.feature_by_bc <- as.vector(so$nFeature_RNA[so$Barcode == barcodes[i]])
count.rna_by_bc <- as.vector(so$nCount_RNA[so$Barcode == barcodes[i]])
n_cells <- length(percent.mt_by_bc)
cur_length <- dim(df_by_bc)[1]
# get QC parameters for each cell sample
df_by_bc <- bind_rows(
df_by_bc,
data.frame(barcode = barcodes[i],
percent.mt = percent.mt_by_bc,
count.feature = count.feature_by_bc,
count.rna = count.rna_by_bc)
)
}
} else { # if no cell type information provided, pool everything together
df_by_bc <- data.frame()
percent.mt_by_bc <- as.vector(so$percent.mt)
count.feature_by_bc <- as.vector(so$nFeature_RNA)
count.rna_by_bc <- as.vector(so$nCount_RNA)
df_by_bc <- data.frame(percent.mt_by_bc, count.feature_by_bc, count.rna_by_bc)
colnames(df_by_bc) <- c("percent.mt", "count.feature", "count.rna")
df_by_bc["barcode"] <-"unspecified"
barcodes <- "unspecified"
so@meta.data[["Barcode"]] <- "unspecified"
}
# Create summary graphs
# plot density plot
plt.handle <- ggplot(df_by_bc, aes(percent.mt, color=barcode)) +
geom_density(size = 1) +
ggtitle(paste(set_name)) +
xlab("Mitochrondrial Content (%)")
output <- list(so, plt.handle)
return(output)
}
# calculate mitochondrial content, stratified by barcode (if available)
output_by_bc <- calc_mito_content_by_bc(so, parameter.list$set_names, (length(total_sets)>1), input.type)
# retrive results
so <- output_by_bc[[1]]
plt.mito_by_bc <- output_by_bc[[2]]
# show mito filter threhsold
if ("mt.upperlimit" %in% names(parameter.list)){
if (parameter.list$mt.upperlimit >=0 & parameter.list$mt.upperlimit <= 100){
plt.mito_by_bc <- plt.mito_by_bc + geom_vline(xintercept = parameter.list$mt.upperlimit, linetype = "dashed")
}
}
if (parameter.list$print.inline){
print(plt.mito_by_bc)
}
```
```{r c13 - qc_violin_plots, fig.height= 5, fig.width = 12, include = FALSE}
if (length(unique(so@meta.data[["subset_group"]]))> 1){
df.qc <- data.frame(so@meta.data[["subset_group"]],
so@meta.data[["nFeature_RNA"]],
so@meta.data[["nCount_RNA"]],
so@meta.data[["percent.mt"]])
colnames(df.qc) <- c("group", "nFeature_RNA", "nCount_RNA", "percent.mt")
df.qc_sum <- df.qc %>%
group_by(group) %>%
summarize(nFeature_mean = round(mean(nFeature_RNA),2),
nCount_mean = round(mean(nCount_RNA),2),
percent.mt_mean = round(mean(percent.mt),2),
nFeature_median = round(median(nFeature_RNA),2),
nCount_median = round(median(nCount_RNA),2),
percent.mt_median = round(median(percent.mt),2))
nFeat_aov <- summary(aov(nFeature_RNA ~ group, data = df.qc))
nCount_aov <- summary(aov(nCount_RNA ~ group, data = df.qc))
mT_aov <- summary(aov(percent.mt ~ group, data = df.qc))
} else {
df.qc <- data.frame()
df.qc_sum <- data.frame()
nFeat_aov <- c()
nCount_aov <- c()
mT_aov <- c()
}
violin.list <- QC.violinPlot(so, plt.log.flag = T)
plt.QC_violin <- violin.list[[1]]
plt.violin_by_subgroup <- violin.list[[2]]
if (parameter.list$print.inline){
print(plt.QC_violin)
print(plt.violin_by_subgroup)
}
```
```{r c14 - QC scatterplots, fig.height= 5, fig.width = 12, include = FALSE}
# Generate QC scatterplots
max.cells <- 40000
if (ncol(so) > max.cells) {
plt.QC_scatter <- QC.scatterPlot(downsampleSeurat(so, subsample.n = max.cells))
} else {
plt.QC_scatter <- QC.scatterPlot(so)
}
if (parameter.list$print.inline) {
print(plt.QC_scatter)
}
```
```{r c15 - filter_QC, fig.width = 5, fig.height = 5, include = FALSE}
if (length(parameter.list$set_names) == 2) {
set.n <- NULL
} else {
set.n <- parameter.list$set_names
}
filter_output_list <- filterSeurat(so = so,
RNA.upperlimit = parameter.list$gene.upperlimit,
RNA.lowerlimit = parameter.list$gene.lowerlimit,
mt.upperlimit = parameter.list$mt.upperlimit,
unmatch.low = unmatch.low,
unmatch.high = unmatch.high,
set_names = set.n)
so <- filter_output_list[["seurat"]]
plt.filter_pre_post <- filter_output_list[["plot"]]
filter.breakdown.list <- filter_output_list[["filter.breakdown"]]
rm(filter_output_list)
plt.filter.venn <- NULL
try({
plt.filter.venn <- ggVennDiagram::ggVennDiagram(filter.breakdown.list)
# plt.filter.venn <- gplots::venn(filter.breakdown.list, show.plot = F, col = gplots::rich.colors(palette="temperature", 3))
}, silent = T)
if (parameter.list$print.inline){
print(plt.filter.venn)
print(plt.filter_pre_post)
}
```
```{r c16 - normalize_data, warning = FALSE, include = FALSE}
# Run data normalization
normalization_method_ps <- "SCT" # "NFS" or "SCT"
enable.parallelization <- F
if ("conserve.memory" %in% names(parameter.list) ){
if (is.logical(parameter.list$conserve.memory)){
conserve.memory <- parameter.list$conserve.memory
} else {
conserve.memory <- F
}
} else {
conserve.memory <- F
}
if (n.workers$normScale == 1) enable.parallelization <- F
parameter.list$clip.range <- c(-sqrt(x = ncol(x = so[["RNA"]])/30), sqrt(x = ncol(x =
so[["RNA"]])/30))
# if (parameter.list$clip.range > 10){
# parameter.list$clip.range <- 10
# }
so<- scNormScale(so, gNames.list, normalization_method_ps, vars2regress = parameter.list$vars2regress,
n.workers = n.workers$normScale, max.memory = (max.memory*20480/20 * 1024^2),
enable.parallelization = enable.parallelization, conserve.memory = conserve.memory,
clip.range = parameter.list$clip.range)
# plot variable genes
var.gene.orig <- so@assays[["SCT"]]@var.features
plt.variable_genes <- variableGenes.Plot(so, gNames.list, parameter.list$set_names)
plt.variable_genes <- plt.variable_genes +
labs(title = "Variably-Expressed Genes") +
theme_miko(legend = T) +
theme(legend.position = "bottom")
if (parameter.list$print.inline) {
print(plt.variable_genes)
}
```
```{r c17 - per cell statistics, include = FALSE}
# ptm <- proc.time()
df_nCount <- data.frame(so@meta.data[["nCount_RNA"]], so@meta.data[["nCount_SCT"]])
colnames(df_nCount) <- c("UMI/cell (raw)", "UMI/cell (normalized)")
df_nCount_m <- melt(df_nCount)
df_nFeat <- data.frame(so@meta.data[["nFeature_RNA"]], so@meta.data[["nFeature_SCT"]])
colnames(df_nFeat) <- c("genes/cell (raw)","genes/cell (normalized)")
df_nFeat_m <- melt(df_nFeat)
plt.UMI_per_cell <- ggplot(df_nCount_m, aes(x = value, fill = variable)) +
geom_density(alpha = 0.3) + scale_x_log10()+ xlab("UMI/cell")
plt.genes_per_cell <- ggplot(df_nFeat_m, aes(x = value, fill = variable)) +
geom_density(alpha = 0.3) + scale_x_log10()+ xlab("genes/cell")
plt.pre_post_1 <- FeatureScatter(so, feature1 = "nCount_RNA", feature2 = "nCount_SCT") +
xlab("UMI/cell (raw)") + ylab("UMI/cell (normalized)")
plt.pre_post_2 <- FeatureScatter(so, feature1 = "nFeature_RNA", feature2 = "nFeature_SCT") +
xlab("Genes/cell (raw)") + ylab("Genes/cell (normalized)")
plt.pre_post_norm <- CombinePlots(plots = list(plt.pre_post_1, plt.pre_post_2), ncol = 2, legend = 'none')
if (parameter.list$print.inline){
print(plt.UMI_per_cell)
print(plt.genes_per_cell)
print(plt.pre_post_norm)
}
```
```{r deviance feature selection, include = FALSE}
# feature selection using deviance #############################################
# get deviant features using subset of data ####################################
if (parameter.list$feature.select.method == "deviance"){
miko_message("Performing feature selection using deviance scores...")
DefaultAssay(so) <- "SCT"
so.sub <- downsampleSeurat(object = so, subsample.n = 20000)
m <- GetAssayData(so.sub, slot = "data", assay = "SCT")
m <- m[rownames(m) %in% rownames(so.sub), ]
m <- as.matrix(m)
batch.factor <- as.factor(so.sub@meta.data$Barcode)
# system.time({
devs <- scry::devianceFeatureSelection(object = m, batch = batch.factor, fam = "binomial")
# })
df.dev <- data.frame(
gene = rownames(m),
d = devs,
is.var = rownames(m) %in% var.gene.orig
) %>% dplyr::arrange(-d)
df.dev$d.norm <- df.dev$d/sum(df.dev$d)
df.dev$d.cumsum <- cumsum(df.dev$d.norm )
dev.gene <- df.dev$gene[df.dev$d.cumsum < 0.8]
if (length(dev.gene) > 2000){
dev.gene <- df.dev$gene[1:2000]
}
dev.gene <- dev.gene[!grepl("MT", toupper(gNames.list[dev.gene]))]
so@assays[["SCT"]]@var.features <- dev.gene
} else if (parameter.list$feature.select.method == "hvg"){
dev.gene <- var.gene.orig[!grepl("MT", toupper(gNames.list[var.gene.orig]))]
so@assays[["SCT"]]@var.features <- dev.gene
}
```
```{r null model residuals, include = FALSE}
if (parameter.list$scale.method == "null_residuals"){
m <- GetAssayData(so, slot = "data", assay = "SCT")
m <- m[rownames(m) %in% dev.gene, ]
so[["DEV"]] <- CreateAssayObject(counts = m)
m <- as.matrix(m)
batch.factor <- as.factor(so@meta.data$Barcode)
# system.time({
m.deviance <- scry::nullResiduals(
object = m,
fam = c("binomial"),
type = c("deviance"),
batch = batch.factor
)
# })
m.deviance[is.na(m.deviance)] <- 0
rownames(m.deviance) <- rownames(m);
colnames(m.deviance) <- colnames(m)
invisible({gc()})
m.deviance[m.deviance < parameter.list$clip.range[1]] <- parameter.list$clip.range[1]
m.deviance[m.deviance > parameter.list$clip.range[2]] <- parameter.list$clip.range[2]
# parameter.list$clip.range
so@assays[["DEV"]]@scale.data <- m.deviance
so@assays[["DEV"]]@var.features <- dev.gene
DefaultAssay(so) <- "DEV"
rm(so.sub)
rm(m);
rm(m.deviance);
invisible({gc() })
}
```
```{r identify artefact genes, include = FALSE}
u.bc <- unique(so@meta.data$Barcode)
if (length(u.bc) > 2){
# var.gene <- var.gene.orig
var.gene <- so@assays[[DefaultAssay(so)]]@var.features
miko_message("Identifying artifact genes...")
ag.res <- findArtifactGenes(object = so, assay = "RNA", features = var.gene,
meta.feature = "Barcode", umi.count.threshold = 5, difference.threshold = 30, verbose = T)
plt.ag <- ag.res[["plot"]]
try({
gene.rep <- checkGeneRep(reference.genes = gNames.list, query.genes = plt.ag[["data"]][["gene"]])
if (gene.rep == "ensembl"){
plt.ag[["data"]][["gene"]] <- as.character(gNames.list[plt.ag[["data"]][["gene"]]])
plt.ag[["layers"]][[2]][["data"]][["gene"]] <- as.character(gNames.list[plt.ag[["layers"]][[2]][["data"]][["gene"]]])
}
}, silent = T)
if (parameter.list$correct.artifact){
miko_message(paste0(length(ag.res$artifact.gene), " artifact genes omitted from PCA..."))
var.gene.update <- var.gene[!c(var.gene %in% ag.res$artifact.gene)]
artifact.gene <- ag.res$artifact.gene
# var.gene.update <- var.gene[!c(var.gene %in% artifact.gene)]
# expr.cell <- getExpressedGenes(object = so, min.pct = 0.1, group = NA, group.boolean = "OR")
# var.gene.update <- intersect(var.gene.update, expr.cell)
# var.gene.update <- var.gene.update[!(var.gene.update %in% artifact.gene)]
var.gene.update <- var.gene.update[!grepl("MT", toupper(gNames.list[var.gene.update]))]
so@assays[[DefaultAssay(so)]]@var.features <- var.gene.update
}
if (parameter.list$print.inline) print(plt.ag)
} else {
var.gene <- so@assays[[DefaultAssay(so)]]@var.features
var.gene.update <- var.gene[!grepl("MT", toupper(gNames.list[var.gene]))]
}
if (parameter.list$scale.method == "sct"){
so <- GetResidual(so, features = so@assays[[DefaultAssay(so)]]@var.features, verbose = F)
}
# dev.gene <- dev.gene[!(dev.gene %in% artifact.gene)]
# DefaultAssay(so) <- "SCT"
# plt.ag
```
```{r c18 - preliminary umap}
# scale.byscale.by.mad
# if (parameter.list$scale.method == "sct"){
n.dim.all <- 50
# } else if (parameter.list$scale.method == "null_residuals"){
# n.dim.all <- 50
# }
# Run pca
if (parameter.list$pca.method == "pca"){
miko_message("Running PCA...")
so <- RunPCA(so, verbose = FALSE, features = so@assays[[DefaultAssay(so)]]@var.features,
npcs = n.dim.all)
} else if (parameter.list$pca.method == "glmpca"){
miko_message("Running GLM-PCA...")
so <- SeuratWrappers::RunGLMPCA(
object = so,
L = n.dim.all,
assay = NULL,
features = so@assays[[DefaultAssay(so)]]@var.features,
reduction.name = "pca",
reduction.key = "PC_",
verbose = F,
minibatch = "stochastic")
}
# specify number of PCA components to use for downstream analysis
pca.var.threshold <- parameter.list$pca.var.threshold
pca.prop <- propVarPCA(so)
# mat <- Seurat::GetAssayData(so, assay = DefaultAssay(so), slot = "scale.data")
# mat <- mat[rownames(mat) %in% so@assays[[DefaultAssay(so)]]@var.features, ]
# pca <- so[["pca"]]
# # Get the total variance:
# total_variance <- sum(matrixStats::rowVars(mat))
# eigValues = (pca@stdev)^2 ## EigenValues
# varExplained = eigValues / total_variance
# pca.prop$ve <- varExplained
# pca.prop$cve <- cumsum(pca.prop$ve)
# rm(mat); rm(pca);
# invisible({gc()})
# pca.threshold.method <- "elbow"
pca.elbow.low <- 0.015
if (parameter.list$pca.component.select.method == "elbow"){
nDim <- pcaElbow(pca.prop$pc.prop_var, low = pca.elbow.low, max.pc = 0.9)
} else if (parameter.list$pca.component.select.method == "cum_var"){
nDim <- max(pca.prop$pc.id[pca.prop$pc.cum_sum% dplyr::filter(auc > 0.95)
# projectReduction(so, n.components = 10, show.n.features = 30)
```
```{r c19 - PCA}
pc.elbow_min_difference_ps <- 0.001
pc.cum_var_explained_threshold_ps <- pca.var.threshold
PC_number_method_ps <- 2
# calculate proportion of variance explained by each PC ---------------------
pc.std <- so@reductions[["pca"]]@stdev
pc.var <- pc.std^2
pc.prop_var <- pc.var/sum(pc.var)
pc.cum_sum <- cumsum(pc.prop_var)
pc.id <- c(1:ncol(so@reductions[["pca"]]))
scree.var <- data.frame(pc.id, pc.prop_var, pc.cum_sum)
# determine number of PCs included ------------------------------------------
pc.nPC_method_1 <- sum(abs(diff(pc.prop_var))>pc.elbow_min_difference_ps) +1
pc.nPC_method_2 <- sum(pc.cum_sum%
dplyr::group_by(cluster_membership, barcode_labels) %>%
tally() %>%
dplyr::mutate(freq = n / sum(n))
u_barcodes <- unique(df.all_barcodes$barcode_labels)
if (length(u_barcodes) > 1) {
# convert long to wide (cell type table)
df_for_wide_barcodes <- df.all_barcodes
colnames(df_for_wide_barcodes)[colnames(df_for_wide_barcodes) == "barcode_labels"] <- "predicted_labels"
df.all_barcodes_wide <- long2wide(df_for_wide_barcodes)
# plot cluster compositio by barcode
df.cluster_annotations_barcodes <- df.all_barcodes
colnames(df.cluster_annotations_barcodes)[colnames(df.cluster_annotations_barcodes) == "barcode_labels"] <- "predicted_labels"
plt.cluster_composition_barcodes <- plot.cluster_composition_alt(df.cluster_annotations_barcodes,
other_threshold = 0)
plt.cluster_composition_barcodes <- plt.cluster_composition_barcodes +
theme_miko(legend = T) +
ggthemes::scale_fill_ptol("Barcode") +
labs(title = "Cluster Composition", subtitle = "Stratified by Barcodes")
} else {
df_for_wide_barcodes <- data.frame()
df.all_barcodes_wide <- data.frame()
df.cluster_annotations_barcodes <- data.frame()
plt.cluster_composition_barcodes <- c()
}
if (parameter.list$print.inline) {
print(plt.cluster_composition_barcodes)
}
```
```{r c20 - cell cycle scoring, fig.width=14, fig.height = 4}
# get cell cycle genes
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes
if (length(unique(parameter.list$organism.include)) > 1){
s.genes <- toupper(s.genes)
g2m.genes <- toupper(g2m.genes)
} else if (unique(parameter.list$organism.include) == "Mm"){
s.genes <- firstup(s.genes)
g2m.genes <- firstup(g2m.genes)
} else if (unique(parameter.list$organism.include) == "Hs"){
s.genes <- toupper(s.genes)
g2m.genes <- toupper(g2m.genes)
}
so <- ens2sym.so(so, gNames.list)
# cell cycle scoring and umap
plt.cc.umap <- tryCatch({
so <- CellCycleScoring(so, s.features = s.genes, g2m.features = g2m.genes, set.ident = F)
plt.cc.umap <- cluster.UMAP(
so,
pt.size = scMiko::autoPointSize(nrow(so), scale.factor = 1583),
group.by = "Phase",
x.label = "UMAP 1",
y.label = "UMAP 2",
plot.name = "Cell Cycle Classifications",
include.labels = F)
}, error = function(e){
plt.cc.umap <- NULL
return(plt.cc.umap)
})
# cluster-level composition
plt.cc <- NULL
if ("Phase" %in% names(so@meta.data)){
df.cc <- data.frame(clusters = so@meta.data[["seurat_clusters"]], phase = so@meta.data[["Phase"]])
df.cc.tally <- df.cc %>%
group_by(clusters, phase) %>%
tally()
plt.cc.compo <- df.cc.tally %>%
ggplot(aes(x = clusters, y = n, fill = phase)) +
geom_bar(position="fill", stat="identity") +
theme_miko(legend = T) +
labs(title = "Cluster Composition by Cell Cycle") +
xlab("Cluster ID") + ylab("Relative Frequency")
# plt.umap_by_cluster + theme(legend.position = "none"),
plt.cc <- cowplot::plot_grid(plt.cc.umap + ggthemes::scale_colour_tableau(),
plt.cc.compo + ggthemes::scale_fill_tableau(), nrow = 1, labels = c("F", "G"))
}
if (parameter.list$print.inline) {
print(plt.cc)
}
```
```{r c21 - variable gene list}
if (normalization_method_ps == "SCT"){
df.var.meta <- so@assays[["SCT"]]@meta.features
df.var.model <- so@assays[["SCT"]]@SCTModel.list[["model1"]]@feature.attributes
var.gene <- so@assays[["SCT"]]@var.features
df.var.model$ENSEMBLE <- df.var.meta$ENSEMBLE
df.var.model$SYMBOL <- df.var.meta$SYMBOL
so@assays[["SCT"]]@SCTModel.list[["model1"]]@feature.attributes <- df.var.model
df.var.meta <- df.var.model %>% dplyr::filter(SYMBOL %in% var.gene)
# df.var.meta <- df.var.meta[df.var.meta$sct.variable == T, ]
# df.var.meta <- bind_cols(data.frame(genes = rownames(df.var.meta)), df.var.meta)
df.var.meta <- df.var.meta %>% dplyr::arrange(-residual_variance)
which.round <- colnames(df.var.meta)[ !(colnames(df.var.meta) %in% c("ENSEMBLE", "SYMBOL", "genes_log_gmean_step1", "step1_theta", "step1_(intercept)", "step1_log_umi"))]
try({df.var.meta[which.round] <- signif(df.var.meta[which.round], 3)}, silent = T)
} else {
df.var.meta <- NULL
}
```
```{r c22 - run additional dimensional reductions}
#
#
# if (parameter.list$do.additional.reduction){
#
# # tsne #####################
#
# plt.tsne.combo <- tryCatch({
# so <-RunTSNE(so, dims = 1:nDim)
#
# # generate plots
# plt.tsne_by_cluster <- DimPlot(so, reduction = "tsne", label = TRUE) + ggtitle(label = "TSNE") +
# xlab("TSNE 1") + ylab("TSNE 2")
#
# plt.tsne <- DimPlot(so, reduction = "tsne",group.by = "Barcode") + ggtitle(label = "TSNE") +
# xlab("TSNE 1") + ylab("TSNE 2")
#
# # combine plots
# plt.tsne.combo <- cowplot::plot_grid(plt.tsne_by_cluster, plt.tsne, ncol = 2)
# }, error = function(e){
# plt.tsne.combo <- NULL
# return(plt.tsne.combo)
# })
#
#
# # PCA ########################
#
# # generate plots
# plt.pca_by_cluster <- DimPlot(so, reduction = "pca", label = TRUE) + ggtitle(label = "PCA") +
# xlab("PCA 1") + ylab("PCA 2")
#
# plt.pca <- DimPlot(so, reduction = "pca",group.by = "Barcode") + ggtitle(label = "PCA") +
# xlab("PCA 1") + ylab("PCA 2")
#
# # combine plots
# plt.pca.combo <- cowplot::plot_grid(plt.pca_by_cluster, plt.pca, ncol = 2)
#
# # ICA ########################
# so <- RunICA(so, verbose = F)
#
# # generate plots
# plt.ica_by_cluster <- DimPlot(so, reduction = "ica", label = TRUE) + ggtitle(label = "ICA") +
# xlab("ICA 1") + ylab("ICA 2")
#
# plt.ica <- DimPlot(so, reduction = "ica",group.by = "Barcode") + ggtitle(label = "ICA") +
# xlab("ICA 1") + ylab("ICA 2")
#
# # combine plots
# plt.ica.combo <- cowplot::plot_grid(plt.ica_by_cluster, plt.ica, ncol = 2)
#
# } else {
# plt.tsne.combo <- NULL
# plt.ica.combo <- NULL
# plt.pca.combo <- NULL
# }
#
#
# if (parameter.list$print.inline){
# print(plt.tsne.combo)
# print(plt.pca.combo)
# print(plt.ica.combo)
# }
```
```{r c23 - summary statistics, warning = FALSE}
meta.data <- names(so@meta.data)
# get subgroup names
quantifier.stem <- c("Count", "Feature", "percent.mt")
which.quant <- grepl(paste(quantifier.stem, collapse = "|"), meta.data)
quant.names <- meta.data[which.quant]
meta.names <- meta.data[!which.quant]
df.meta <- so@meta.data
summary.list <- list()
for (i in 1:length(meta.names)){
current.meta.name <- meta.names[i]
if (current.meta.name %in% c("unmatched.rate", "S.Score", "G2M.Score")) next
df.meta.summary <- NULL
df.meta.summary <- df.meta %>%
group_by(get(meta.names[i])) %>%
dplyr::summarise(n.nuclei =n(),
UMI.mean = round(mean(nCount_RNA)),
UMI.median = round(median(nCount_RNA)),
UMI.min = min(nCount_RNA),
UMI.max = max(nCount_RNA),
genes.mean = round(mean(nFeature_RNA)),
genes.median = round(median(nFeature_RNA)),
genes.min = min(nFeature_RNA),
genes.max = max(nFeature_RNA),
mito.mean = signif(mean(percent.mt),3) ,
mito.median = signif(median(percent.mt),3) ,
mito.min = signif(min(percent.mt),3) ,
mito.max = signif(max(percent.mt),3))
colnames(df.meta.summary)[1] <- meta.names[i]
if (dim(df.meta.summary)[1] == 1){
df.meta.summary <- data.frame(t(df.meta.summary)[2:dim(df.meta.summary)[2], ])
colnames(df.meta.summary)[1] <- meta.names[i]
} else {
df.meta.summary <- as.data.frame(t(df.meta.summary))
new.col.name <- as.matrix(df.meta.summary[1,])
df.meta.summary <- as.data.frame(df.meta.summary[2:nrow(df.meta.summary),])
colnames(df.meta.summary) <- new.col.name
}
try({
summary.list[[meta.names[i]]] <- df.meta.summary
}, silent = T)
}
```
```{r central log}
# update central log
run.id <- NULL
if (!exists("user")) user <- "guest"
clog.update.success <- F
if (parameter.list$update.log){
try({
run.id <- updateCentralLog(Module = "M01", input.data = which.data, input.subset = which.strata, pdf.flag = parameter.list$save.pdf)
clog.update.success <- T
}, silent = F)
}
if (is.null(run.id)) {
warning("Central log update was unsuccessful :(\n")
run.id <- paste("M01_", user, "_r", paste0(format(Sys.time(), '%s')), sep = "", collapse = "")
}
```
```{r analysis log}
# Normalization Method #
if (normalization_method_ps == "NFS"){
norm_method <- "normalize, find features, scale"
} else if (normalization_method_ps == "SCT"){
norm_method <- "SCT"
}
df.log <- addLogEntry("Normalization Method", norm_method, df.log, "norm_method")
df.log <- addLogEntry("PCA Method", parameter.list$pca.method, df.log, "pca.method")
df.log <- addLogEntry("Principal Components Included", nDim, df.log, "nDim")
df.log <- addLogEntry("Unmatched rate upper limit", unmatch.high, df.log, "unmatch.high")
df.log <- addLogEntry("Unmatched rate lower limit", unmatch.low, df.log, "unmatch.low")
df.log <- addLogEntry("nPCA components used", nDim, df.log, "nDim")
end.time <- proc.time()
elapsed.time <- round((end.time - start.time)[[3]], 2)
df.log <- addLogEntry("Run Time (s)", elapsed.time, df.log, "elapsed.time")
df.log <- addLogEntry("Run Identifier", run.id, df.log, "run.id")
df.log <- addLogEntry("User", user, df.log, "user")
df.log <- addLogEntry("Central Log Updated", clog.update.success, df.log, "clog.update.success")
```
```{r output path and save results}
# output path
if (!exists("data.path")) data.path = ""
output.path <- paste0(data.path, "Module_Outputs/", paste0(run.id,"_",format(Sys.time(), '%d%m%y'), "/"))
# create output directories
dir.create(output.path)
dir.create(paste0(output.path, "Tables/"))
if (parameter.list$save.pdf) dir.create(paste0(output.path, "PDF/"))
parameter.list$save.filename <- paste0(run.id, "_", parameter.list$save.filename)
df.log <- addLogEntry("Output Results (.Rdata)", as.character(parameter.list$save.filename), df.log, "save.filename")
df.log <- addLogEntry("Output Directory", as.character(parameter.list$save.directory), df.log, "save.directory")
df.log <- addLogEntry("Output Saved", as.character(parameter.list$save.flag), df.log, "save.flag")
# save results to Preprocessed_Datasets/
save.filename <- paste0(data.path, parameter.list$save.directory , parameter.list$save.filename)
df.log_Module_1 <- df.log
if ((parameter.list$save.flag == TRUE) ){
save(so, gNames.list, df.log_Module_1, file = save.filename)
}
```
QC Overview
=====================================
Sidebar {.sidebar }
-------------------------------------
**Description**: scRNAseq preprocessing and quality check\
**Figure Legends**:\
**A|** Distribution of gene counts, transcript counts, and mitochondrial percentage per cell.\
**B|** Relationship between QC measures. *Color* scale represents cell density.\
**C|** Distribution of mitochondial percentage per cell, stratified by samples (Barcodes).\
**D|** Number of cells pre- and post- filtering step. Percentage of omitted cells are indicated.\
**E|** Relationship between pre and post-normalized transcript and gene counts.\
**F|** Distribution of pre and post-normalized transcript counts.\
**G|** Distribution of pre and post-normalized gene counts.\
**H|** Variable gene plot.\
**Notes**:\
*For A-C*, all cells are shown (i.e., pre-filtering step)\
*For E-H*, subset of filtered cells are shown (i.e., post-filtering step)\
**Definitions**:\
**nFeature**: Number of genes.\
**nCount**: Number of transcripts.\
**percent.mt**: Percentage of mitochondrial transcripts.\
**Barcode**: Sample identifier.\
Row
-----------------------------------------------------------------------
### A| QC distributions
```{r plt.QC_violin, fig.height= 5, fig.width = 10}
print(plt.QC_violin)
savePDF(file.name = paste0(output.path, "PDF/", "M01_QC_violin.pdf"), plot.handle = plt.QC_violin, fig.width = 10, fig.height = 5, save.flag = parameter.list$save.pdf)
```
### B| Pairwise QC relationships
```{r plt.QC_scatter, fig.height= 5, fig.width = 12}
print(plt.QC_scatter)
savePDF(file.name = paste0(output.path, "PDF/", "M01_QC_scatter.pdf"), plot.handle = plt.QC_scatter, fig.width = 12, fig.height = 5, save.flag = parameter.list$save.pdf)
```
### C| Mitochondrial content distribution
```{r mt.mito_by_bc}
print(plt.mito_by_bc + theme_miko(legend = T))
savePDF(file.name = paste0(output.path, "PDF/", "M01_mito_by_bc.pdf"),
plot.handle = plt.mito_by_bc + theme_miko(legend = T), fig.width = 8, fig.height = 5, save.flag = parameter.list$save.pdf)
```
### D| Cells Recovered: Pre- and Post-Filtering
```{r plt.filter_pre_post}
print(plt.filter_pre_post + theme_miko(legend = T))
savePDF(file.name = paste0(output.path, "PDF/", "M01_filter_pre_post.pdf"), plot.handle = plt.filter_pre_post + theme_miko(legend = T),
fig.width = 5, fig.height = 5, save.flag = parameter.list$save.pdf)
```
Row
-----------------------------------------------------------------------
### E| Pre- vs Post-Normalization Statistics
```{r plt.pre_post_norm}
print(plt.pre_post_norm)
savePDF(file.name = paste0(output.path, "PDF/", "M01_pre_post_norm.pdf"), plot.handle = plt.pre_post_norm,
fig.width = 5, fig.height = 5, save.flag = parameter.list$save.pdf)
```
### F| Transcripts per cell
```{r plt.UMI_per_cell}
print(plt.UMI_per_cell + theme_miko(legend = T))
savePDF(file.name = paste0(output.path, "PDF/", "M01_transcripts_per_cell.pdf"), plot.handle = plt.UMI_per_cell + theme_miko(legend = T),
fig.width = 5, fig.height = 5, save.flag = parameter.list$save.pdf)
```
### G| Genes per cell
```{r plt.genes_per_cell}
print(plt.genes_per_cell + theme_miko(legend = T))
savePDF(file.name = paste0(output.path, "PDF/", "M01_genes_per_cell.pdf"), plot.handle = plt.genes_per_cell + theme_miko(legend = T),
fig.width = 5, fig.height = 5, save.flag = parameter.list$save.pdf)
```
### H| Variable gene plot
```{r plt.variable_genes}
print(plt.variable_genes)
savePDF(file.name = paste0(output.path, "PDF/", "M01_variable_genes.pdf"), plot.handle = plt.variable_genes,
fig.width = 5, fig.height = 5, save.flag = parameter.list$save.pdf)
```
Row
-----------------------------------------------------------------------
### Cells Recovered
```{r valuebox1}
valueBox(ncol(so))
```
### Genes per Cell (Median)
```{r valuebox2}
valueBox(median(so$nFeature_RNA))
```
### Transcripts per Cell (Median)
```{r valuebox3}
valueBox(median(so$nCount_RNA))
```
Filter and PCA Summary
=====================================
Sidebar {.sidebar }
-------------------------------------
**Description**: QC rank plots and filtering cutoffs, along with PCA summary.
**Figure Legends**:\
**A|** Transcript vs. gene count relationship.\
**B|** Transcript count per cell ranking.\
**C|** Gene count per cell ranking.\
**D|** Mitochondrial content per cell ranking.\
**E|** Venn diagram of filtered cells, grouped by filtering criteria.\
**F|** PCA scree plot showing proportion of variance explained by each principal component.\
**G|** Cumulative variance explained by principal components.\
**H|** Artifact gene detection (Lause 2021)
Row {.tabset}
-------------------------------------
### Filtering Statistics
```{r plt QC ranks mT, fig.width = 15, fig.height = 8}
plt.QC.stat <- cowplot::plot_grid(cowplot::plot_grid(plt.gene.umi.mito, plt.umi.rank.mito, plt.gene.rank.mito, plt.mt.rank.mito, ncol = 2, labels = c("A", "B", "C", "D"), align = "h"), plt.filter.venn, labels = c("", "E"))
print(plt.QC.stat)
savePDF(file.name = paste0(output.path, "PDF/", "M01_filtering_statistics.pdf"),
plot.handle = plt.QC.stat,
fig.width = 15, fig.height = 8, save.flag = parameter.list$save.pdf)
```
### PCA Scree Plots
```{r plt.PCA, fig.width=10, fig.height = 4}
cowplot::plot_grid(plt.scree1, plt.scree2, ncol=2, labels= c("F", "G"))
savePDF(file.name = paste0(output.path, "PDF/", "M01_scree.pdf"), plot.handle = cowplot::plot_grid(plt.scree1, plt.scree2, ncol=2),
fig.width = 8, fig.height = 4, save.flag = parameter.list$save.pdf)
```
### Artifact Genes
```{r artifact gene plot, fig.height=7, fig.width=8}
try({
plt.ag <- cowplot::plot_grid(plt.ag, labels = "H")
print(plt.ag)
savePDF(file.name = paste0(output.path, "PDF/", "M01_artifact_genes.pdf"), plot.handle = plt.ag,
fig.height=7, fig.width=8, save.flag = parameter.list$save.pdf)
}, silent =T)
```
```{r plt QC ranks uR, fig.width=12, fig.height = 7}
### QC (unmatched rate)
# try({
# cowplot::plot_grid(plt.umi.rank.match, plt.gene.rank.match, plt.mt.rank.match,
# plt.unmatch.umi, plt.gene.umi.match, plt.umatch.rank,ncol = 3, nrow = 2)
#
# }, silent = T)
```
```{r}
# try({
# savePDF(file.name = paste0(output.path, "PDF/", "M01_QC_unmatchedRate_ranks.pdf"),
# plot.handle = cowplot::plot_grid(plt.umi.rank.match, plt.gene.rank.match, plt.mt.rank.match,
# plt.unmatch.umi, plt.gene.umi.match, plt.umatch.rank,ncol = 3, nrow = 2) ,
# fig.width = 17, fig.height = 10, save.flag = save.pdf)
#
# }, silent = T)
```
UMAP
=====================================
Sidebar {.sidebar }
-------------------------------------
**Description**: Uniform Manifold Approximation and Projection (UMAP) plots with overlaid cluster, sample, QC and cell cycle information.
**Figure Legends**:\
**A|** Cell cluster UMAP.\
**B|** Sample UMAP.\
**C|** Transcript count (n/cell) UMAP.\
**D|** Gene count (n/cell) UMAP.\
**E|** Mitochondrial content (%/cell) UMAP.\
**F|** Cell cycle state UMAP.\
**G|** Barplot of Relative abundances of cells in each cell cycle state, stratified by cell cluster.\
Row {.tabset}
-----------------------------------------------------------------------
### Cell Clusters
```{r, fig.width=12, fig.height=5}
print(plt.umap.combo)
savePDF(file.name =paste0(output.path, "PDF/", "M01_umap.pdf"), plot.handle = plt.umap.combo,
fig.width = 12, fig.height = 5, save.flag = parameter.list$save.pdf)
```
### QC Statistics
```{r UMAP QC plots, fig.width=15, fig.height=6}
plt.umap.qc
savePDF(file.name = paste0(output.path, "PDF/", "M01_QC_umap.pdf"), plot.handle = plt.umap.qc,
fig.width = 12, fig.height = 6, save.flag = parameter.list$save.pdf)
```
### Cell Cycle
```{r plot cc plot, fig.width=12, fig.height=5}
try({
print(plt.cc)
savePDF(file.name = paste0(output.path, "PDF/", "M01_cell_cycle.pdf"), plot.handle = plt.cc,
fig.width = 14, fig.height = 4, save.flag = parameter.list$save.pdf)
}, silent = T)
```
```{r, fig.width=12, fig.height=5}
### TSNE
# print(plt.tsne.combo)
# if (!is.null(plt.tsne.combo)){
# savePDF(file.name = paste0(output.path, "PDF/", "M01_tsne.pdf"), plot.handle = plt.tsne.combo,
# fig.width = 12, fig.height = 5, save.flag = save.pdf)
# }
```
```{r, fig.width=12, fig.height=5}
### PCA
# print(plt.pca.combo)
#
# if (!is.null(plt.pca.combo)){
# savePDF(file.name = paste0(output.path, "PDF/", "M01_pca.pdf"), plot.handle = plt.pca.combo,
# fig.width = 12, fig.height = 5, save.flag = save.pdf)
# }
```
```{r, fig.width=12, fig.height=5}
### ICA
# print(plt.ica.combo)
#
# if (!is.null(plt.ica.combo)){
# savePDF(file.name = paste0(output.path, "PDF/", "M01_ica.pdf"), plot.handle = plt.ica.combo,
# fig.width = 12, fig.height = 5, save.flag = save.pdf)
# }
```
```{r}
# report.subgroup <- length(unique(so@meta.data[["subset_group"]]))> 1
# a1 <- knitr::knit_expand(text = sprintf("\n%s", "QC by subgroup"))
# a2 <- knitr::knit_expand(text = "\n=====================================")
# out1 <- paste(a1, a2, collapse = '\n')
# `r if (report.subgroup){ paste(knitr::knit(text = paste(out1, collapse = '\n'))) }`
```
```{r}
# a1 <- knitr::knit_expand(text = sprintf("\n%s", "TF Network"))
# a2 <- knitr::knit_expand(text = sprintf("\n%s\n", "====================================="))
# a3 <- knitr::knit_expand(text = sprintf("\n%s", "Sidebar {.sidebar}"))
# a4 <- knitr::knit_expand(text = sprintf("\n%s\n", "-------------------------------------"))
# a5 <- knitr::knit_expand(text = "\n**QC statistics by subgroup**\n\n**Description**: Data are stratified by subgroup, and QC statistics are shown.")
#
# out2 <- paste(a3, a4,a5, collapse = '\n')
# `r if (report.subgroup){ paste(knitr::knit(text = paste(out2, collapse = '\n'))) }`
```
```{r scale free}
# r1 <- knitr::knit_expand(text = sprintf("\n%s", "Row {.tabset}"))
# r2 <- knitr::knit_expand(text = sprintf("\n%s\n", "-------------------------------------"))
#
# row.chunk <- paste(r1, r2,collapse = '\n')
#
# a1 <- knitr::knit_expand(text = sprintf("### %s\n", "QC distributions")) # tab header
# a2 <- knitr::knit_expand(text = sprintf("\n```{r %s, message=FALSE, warning=FALSE, fig.width = 12, fig.height=5}", #fig.width = 8, fig.height=8,
# paste("plt.violin_by_subgroup", 1, sep = "")))
# a3 <- knitr::knit_expand(text = sprintf("\n %s", "print(plt.violin_by_subgroup)"))
# a4 <- knitr::knit_expand(text = "\n```\n") # end r chunk
#
# a1.chunk <- paste(a1, a2, a3, a4, collapse = '\n')
#
# a1 <- knitr::knit_expand(text = sprintf("### %s\n", "QC table")) # tab header
# a2 <- knitr::knit_expand(text = sprintf("\n```{r %s, message=FALSE, warning=FALSE, fig.width = 10, fig.height=5}", #fig.width = 8, fig.height=8,
# paste("df.qc_sum table", 1, sep = "")))
# a3 <- knitr::knit_expand(text = sprintf("\n %s", "knitr::kable(df.qc_sum)"))
# a4 <- knitr::knit_expand(text = "\n```\n") # end r chunk
#
# a2.chunk <- paste(a1, a2, a3, a4, collapse = '\n')
#
# try({
# savePDF(file.name = paste0(output.path, "PDF/", "general_scaleFree_parameterOptimization.pdf"), plot.handle = plt.sft,
# fig.width = 10, fig.height = 5, save.flag = parameter.list$save.pdf)
# }, silent = T)
# try({
# savePDF(file.name = paste0(output.path, "PDF/", "tf_scaleFree_parameterOptimization.pdf"), plot.handle = plt.sft.tf,
# fig.width = 10, fig.height = 5, save.flag = parameter.list$save.pdf)
# }, silent = T)
#
# `r if (report.subgroup){ paste(knitr::knit(text = paste(row.chunk, collapse = '\n'))) }`
#
# `r if (report.subgroup){paste(knitr::knit(text = paste(a1.chunk, collapse = '\n')))}`
#
# `r if (report.subgroup){paste(knitr::knit(text = paste(a2.chunk, collapse = '\n')))}`
#
#
#
#
# QC by subgroups (pre-filtering)
# =====================================
#
# Row {data-height=700}
# -----------------------------------------------------------------------
#
# ```{r plt.violin_by_subgroup, fig.height= 5, fig.width = 12}
#
# print(plt.violin_by_subgroup)
#
# savePDF(file.name = paste0(output.path, "PDF/", "M01_violin_by_subgroup.pdf"), plot.handle = plt.violin_by_subgroup,
# fig.width = 12, fig.height = 5, save.flag = parameter.list$save.pdf)
#
# ```
#
# Row {.tabset}
# -----------------------------------------------------------------------
#
# ### QC summary
#
# ```{r df.qc_sum}
# knitr::kable(df.qc_sum)
# ```
#
# ### nFeature_RNA ANOVA
#
# ```{r nFeat_aov}
# knitr::kable(nFeat_aov[[1]])
# ```
#
# ### nCount_RNA ANOVA
#
# ```{r nCount_aov}
# knitr::kable(nCount_aov[[1]])
# ```
#
# ### percent.mt ANOVA
#
# ```{r mT_aov}
# knitr::kable(mT_aov[[1]])
# ```
#
#
```
```{r plt.cluster_composition_barcodes, fig.width=13, fig.height=4}
# Cluster Composition
# =====================================
# try({
# print(plt.cluster_composition_barcodes)
# savePDF(file.name = paste0(output.path, "PDF/", "M01_cluster_composition_barplot.pdf"), plot.handle = plt.cluster_composition_barcodes,
# fig.width = 13, fig.height = 4, save.flag = parameter.list$save.pdf)
# }, silent = T)
```
QC Tables
=====================================
Sidebar {.sidebar }
-------------------------------------
**Description**:\
QC statistics stratified by factors of potential interest, including cell cluster and cell cycle.
**Definitions**\
**n.nuclei**: number of nuclei/cells\
**UMI.mean**: mean transcript count (n/cell)\
**UMI.median**: median transcript count (n/cell)\
**UMI.min**: minimum transcript count (n/cell)\
**UMI.max**: maximum transcript count (n/cell)\
**gene.mean**: mean gene count (n/cell)\
**gene.median**: median gene count (n/cell)\
**gene.min**: minimum gene count (n/cell)\
**gene.max**: maximum gene count (n/cell)\
**mito.mean**: mean mitochondrial content (%/cell)\
**mito.median**: median mitochondrial content (%/cell)\
**mito.min**: minimum mitochondrial content (%/cell)\
**mito.max**: maximum mitochondrial content (%/cell)\
Row {.tabset}
-------------------------------------
```{r summary statistis, echo = FALSE, eval = TRUE, message=FALSE, warning=FALSE}
out <- flex.multiTabTables(summary.list, "summary.list")
```
`r paste(knitr::knit(text = paste(out, collapse = '\n')))`
Variable Genes Table
=====================================
Sidebar {.sidebar }
-------------------------------------
**Description**:\
Results from SCtransform workflow. In brief, variance stabilizing transformation to UMI count data was applied using regularized Negative Binomial regression model to remove unwanted effects from UMI data. See Seurat::sctransform() for details. \
**Definitions**\
gmean: geometric mean\
variance: expression variance\
genes_log_gmean_step1: log-geometric mean of genes used in initial step of parameter estimation\
**Citation**:\
Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol 20, 296 (December 23, 2019). https://doi.org/10.1186/s13059-019-1874-1
Row
-------------------------------------
```{r var table list}
if (!is.null(df.var.meta)){
flex.asDT(df.var.meta)
}
```
```{r save variable gene table}
if (!is.null(df.var.meta)){
if (parameter.list$save.pdf){
write.csv(df.var.meta, file = paste0(output.path, "Tables/", "variableGenes.csv"),
row.names = F)
}
}
```
Log
=====================================
```{r}
knitr::kable(df.log_Module_1)
```
```{r save analysis log as csv}
try({
write.csv(df.log_Module_1, file = paste0(output.path, "Tables/", "analysisLog.csv"),
row.names = F)
}, silent = T)
```
```{r merge pdfs, include = FALSE}
# combine pdfs into single binder
# if (parameter.list$save.pdf){
# try({
# pdf.list <- list.files (path = paste0(output.path, "PDF/") )
# pdf.list <- paste0( paste0(output.path, "PDF/"), pdf.list[grepl(".pdf", pdf.list)])
#
# pdftools::pdf_combine(pdf.list, output = paste0(output.path, "PDF/merged_binder.pdf"))
# }, silent = T)
# }
```